57 results found; search took 15 ms
1.
Network-on-chip (NoC) has rapidly become a promising alternative for complex system-on-chip architectures, including recent multicore architectures. Optimizing NoC architectures with respect to the design objectives of a particular application domain is crucial for achieving high-performance, energy-efficient customized solutions. Although many studies have addressed individual aspects of NoC design, a comprehensive NoC system solution has not yet emerged. This paper presents a novel methodology for complex on-chip communication problems that reduces power, latency and area overhead. The proposed NoC communication architecture sets up virtual source-destination paths between selected pairs of NoC cores so that packets traveling between distant nodes can bypass intermediate routers along these virtual paths. The paths are constructed at design time for an application based on its task graph. A run-time scheduling mechanism is then applied to improve the buffer management, virtual channel and switch allocation schemes, so that the constructed paths are optimized dynamically. The design also reduces router complexity and its overheads. The proposed router has been implemented on the Xilinx Virtex-5 FPGA family. Evaluation results obtained with the SPLASH-2 benchmark suite show that, compared with a conventional NoC router, the proposed router reduces latency by 25% and energy by 53% at a 3.5% area overhead. Overall, the experiments demonstrate a significant reduction in average packet latency and total power consumption with negligible area overhead.
2.
V.A. Chouliaras, V.M. Dwyer, J.L. Nunez-Yanez, K. Nakos 《Integration, the VLSI Journal》 2008, 41(1): 135-152
This work presents a detailed case study in customizing a configurable, extensible, 32-bit RISC processor with vector/SIMD instruction extensions for the efficient execution of block-based video-coding algorithms, using a proprietary co-design environment. In addition to the default Full-Search motion estimation (ME) of the MPEG-2 Test Model 5, fourteen fast ME algorithms were implemented in both scalar and vector form. Results demonstrate a reduction of up to 68% in the dynamic instruction count of the full-search-based encoder, whereas the fast motion estimation algorithms achieved a reduction in instruction count of nearly 90%; both were accelerated via three 128-bit vector/SIMD instructions and compared against the scalar reference implementation of the standard. We address in detail the profiling, vectorization and development of these vector instruction set extensions, discuss in depth the implementation of a parametric vector accelerator that implements these instructions, and show the introduction of that accelerator into a 32-bit RISC processor pipeline in a closely-coupled configuration.
3.
Sophisticated on-chip interconnects using packet- and circuit-switching techniques were recently proposed as a solution to the non-scalable shared-bus schemes currently used in Systems-on-Chip (SoC) implementations. Different interconnect architectures have been studied and adapted for SoCs to achieve high throughput, low latency, low energy consumption and efficient use of silicon area. Recently, a new on-chip interconnect architecture that adapts the WK-recursive network topology has been introduced for SoCs. This paper analyses and compares the energy consumption and area requirements of the WK-recursive network with those of five common on-chip interconnects: 2D Mesh, Ring, Spidergon, Fat-Tree and Butterfly Fat-Tree. We investigated the effects of load and traffic models, and the results show that the traffic model and the load at the processing elements have a direct effect on energy consumption and area requirements. In these results, the WK-recursive interconnect generally has higher energy consumption and silicon area requirements under heavy traffic load.
4.
Godson2H is a complex SoC (System-on-Chip) in the Godson series: a 117 mm², 152-million-transistor chip fabricated in a 65 nm CMOS LP/GP process. It integrates a 1 GHz processor core and numerous high- and low-speed peripheral I/O interfaces. To overcome on-chip-variation problems in deep-submicron design, several methods are adopted in the clock tree, and PVT detectors are integrated for debugging. To meet the low-power constraints of different applications, most state-of-the-art low-power methods are applied, such as dynamic voltage and frequency scaling, power gating and aggressive multi-voltage design.
5.
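In designing an electronic product, the characteristics of the system itself must be fully considered, and each functional module must be designed and resolved as required. Based on the various states and data involved in the design and use of the system, and combining modern electronics with data-processing techniques, this project proposes a solution suited to a hard-pen calligraphy practice system. Each module's function is implemented on a hardware system, demonstrating the correctness and feasibility of the solution.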
6.
7.
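In recent years, replacing traditional single-processor systems with multicore SoCs has shown great advantages in improving system parallelism. Building on an existing hierarchical-bus MPSoC, this paper studies the scalability of a multicore SoC prototype chip design. The platform was designed at the RTL level and prototyped on an FPGA, and pipelined matrix multiplication was used as an example to study how the speedup changes under different workloads. Experimental results show that with 6 processors the speedup is only 4.10 when the loop count is 6; as the loop count increases, the speedup reaches 5.48. The study shows that both the percentage performance improvement and the percentage area increase of the multicore hierarchical-bus prototype chip are proportional to the number of processors, so the performance of the MPSoC prototype chip can be improved by adding processors.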
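The two reported speedups can be put on a common scale by converting them to parallel efficiency (speedup divided by processor count). The sketch below is only an illustration of that arithmetic: the helper name `parallel_efficiency` and the dictionary labels are assumptions, and only the figures 4.10, 5.48 and 6 processors come from the abstract.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup / processors; 1.0 would mean ideal linear scaling."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Speedups reported in the abstract for the 6-processor configuration.
# The labels are illustrative; the abstract gives no exact count for the larger case.
reported = {"loop count 6": 4.10, "larger loop count": 5.48}

efficiencies = {
    label: parallel_efficiency(speedup, 6) for label, speedup in reported.items()
}

for label, eff in efficiencies.items():
    print(f"{label}: efficiency = {eff:.1%}")
```

Under this definition the efficiency rises from roughly 68% to roughly 91% as the loop count grows.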
8.
9.
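AHB, the current mainstream on-chip bus protocol, suffers from low memory-access bandwidth utilization. Based on the observation that DMA transfers are frequent within an SoC, this paper proposes a new optimization: an MCS-DMA module is added inside the memory controller, and the driver binds the MCS-DMA module to the target DMA transfer. On the one hand, this enables data prefetching, which improves bus bandwidth utilization for a single DMA transfer; on the other hand, memory-access requests are completed in a pipelined fashion inside the memory controller, which improves bus bandwidth utilization when multiple DMA transfers run concurrently. After the design was applied to the PKUnity (北大众志) SK SoC, bus bandwidth utilization for a single DMA transfer rose to 100%, and utilization with multiple concurrent DMA transfers rose from 33.3% to 85.5%, while the chip design area increased by only 2.9%.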
10.
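Test scheduling is an important problem in SoC testing. A test-scheduling method that reuses the network-on-chip to test embedded IP cores is used to keep the SoC's power consumption in test mode within the chip's maximum power limit, eliminate conflicts caused by sharing test resources, and thereby reduce test time. A congestion-free routing algorithm that supports test scheduling and a method for optimizing the test scan-chain configuration are proposed. A synthesizable two-dimensional mesh network-on-chip test platform was implemented on an FPGA in VHDL for analyzing and evaluating NoC performance parameters, routing algorithms and NoC-based SoC test methods.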
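The idea of scheduling core tests under a power ceiling can be illustrated with a generic greedy heuristic. This is not the method of the paper above: it ignores NoC routing and scan-chain configuration entirely, and every name and number in it (`schedule_tests`, `total_test_time`, the example cores, the budget of 100) is hypothetical. It only shows how grouping tests into concurrent sessions whose summed power stays within a budget can shorten total test time.

```python
from typing import Dict, List, Tuple

# Each core maps to (test_time, test_power); units are arbitrary and hypothetical.
Cores = Dict[str, Tuple[int, int]]


def schedule_tests(cores: Cores, power_budget: int) -> List[List[str]]:
    """Greedily pack core tests into concurrent sessions within a power budget.

    Cores are visited longest-test-first; each is placed into the first
    session that still has enough power headroom, else a new session opens.
    """
    sessions: List[List[str]] = []
    session_power: List[int] = []
    by_time_desc = sorted(cores.items(), key=lambda item: -item[1][0])
    for name, (_test_time, power) in by_time_desc:
        if power > power_budget:
            raise ValueError(f"{name} alone exceeds the power budget")
        for i, used in enumerate(session_power):
            if used + power <= power_budget:
                sessions[i].append(name)
                session_power[i] += power
                break
        else:
            # No existing session had room, so start a new one.
            sessions.append([name])
            session_power.append(power)
    return sessions


def total_test_time(cores: Cores, sessions: List[List[str]]) -> int:
    """Sessions run one after another; each lasts as long as its longest test."""
    return sum(max(cores[name][0] for name in session) for session in sessions)


# Hypothetical example data, not taken from any paper.
example_cores: Cores = {
    "cpu": (100, 50),
    "dsp": (80, 40),
    "mem": (60, 30),
    "io": (20, 20),
}
plan = schedule_tests(example_cores, power_budget=100)
print(plan, total_test_time(example_cores, plan))
```

With these made-up numbers the four tests fall into two sessions, so the total time is the sum of the two longest tests per session rather than the sum of all four test times.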